Take-Home Excercise 01

Author

Mayuri Salunke

Published

January 30, 2023

Modified

February 12, 2023

1.0 Overview

Water is an essential part of our life, without which we won’t be able to survive for more than 3 days. Living in Singapore, we have access to clean drinkable water 24/7 and as a result we don’t realize the struggles of people who don’t have access to clean water at all. An an example of such are the people in Nigeria. Despite 70% of Nigerians having access to basic water services, more than half of them are contamintated (Reference).

For this assignment, we will be focusing on the State of Osun in Nigeria. Osun, located in the southwestern Nigeria is bounded to the east by Ekiti and Ondo states, Kwara on the north, Ogun to the south and to the west by Oyo State (Reference). Their economy is mainly based on the agriculture and it inhibits the Osun River, a sacred river. However, in the recent years, the river has been polluted by the several mining activities from the surrounding communities (Reference). Hence, it is integral for us to address the issue of providing clean and sustainable water to the people of Osun. Through this assignment, I aim to apply the relevant spatial point pattern analysis learned in class to analyse the Functional and Non-Functional water points in State of Osun, Nigera.

Osun River

2.0 Setup

2.1 Packages Used

  • sf : Used for importing geospatial data, assigning or transforming coordinate systems, and converting geospatial and aspatial data into a sf data frame

  • tidyverse : Used for transforming and better presentation of Data

  • tmap : Used for plotting static point patterns maps or interactive maps

  • spatstat : Used for point-pattern analysis

  • raster : Used to read, write, manipulate, analyse and model gridded spatial data

  • maptools : Used to provide a set of tools for manipulating geographic data

  • kableExtra : Used for table customization

  • funModeling : Used to data cleaning, importance variable, analysis and model performance

  • sfdep :Used for functions creates not present in spdep.

pacman::p_load(sf, maptools, raster, spatstat, tmap, kableExtra, tidyverse, funModeling, sfdep)

2.2 Datasets Used

The below diagram shows the datasets used for the Assignment. We have two types of data - geospatial and aspatial.

For the Aspatial data, we are extracting the data from WPdx Global Data Repositories. The data source consists of two types of data - WPdx-Basic and WPdx+, for the purpose of this project, we will be using the WPdx+.

For the Geospatial data, we will be using the Nigeria Level-2 Administrative Boundary polygon features GIS data. There are two data source for this - Humanitarian Data Exchange (HDE) and geoBoundaries.

# initialise a dataframe of our geospatial and aspatial dataset details
datasets <- data.frame(
  Type=c("Geospatial",
         "Geospatial",
         "Geospatial",
         "Geospatial",
         "Geospatial",
         "Geospatial",
         "Geospatial",
         "Geospatial",
         "Geospatial",
         "Geospatial",
         "Geospatial",
         "Geospatial",
         "Geospatial",
         "Geospatial",
         
         "Aspatial"),
  
  Name=c("geoBoundaries-NGA-ADM2",
         "geoBoundaries-NGA-ADM2",
         "geoBoundaries-NGA-ADM2",
         "geoBoundaries-NGA-ADM2",
         "geoBoundaries-NGA-ADM2",
         "geoBoundaries-NGA-ADM2",
         "nga_admbnda_adm2_osgof_20190417",
         "nga_admbnda_adm2_osgof_20190417",
         "nga_admbnda_adm2_osgof_20190417",
         "nga_admbnda_adm2_osgof_20190417",
         "nga_admbnda_adm2_osgof_20190417",
         "nga_admbnda_adm2_osgof_20190417",
         "nga_admbnda_adm2_osgof_20190417",
         "nga_admbnda_adm2_osgof_20190417",
         
         "WPdx"),
  
  Format=c(".dbf", 
           ".geojson", 
           ".prj", 
           ".shp", 
           ".shx", 
           ".topojson",
           ".CPG",
           ".dbf",
           ".prj",
           ".sbn", 
           ".sbx", 
           ".shp", 
           ".shp", 
           ".shx", 
          
           ".csv"),
  
  Source=c("[geoBoundaries](https://www.geoboundaries.org/index.html#getdata)",
           "[geoBoundaries](https://www.geoboundaries.org/index.html#getdata)",
           "[geoBoundaries](https://www.geoboundaries.org/index.html#getdata)",
           "[geoBoundaries](https://www.geoboundaries.org/index.html#getdata)",
           "[geoBoundaries](https://www.geoboundaries.org/index.html#getdata)",
           "[geoBoundaries](https://www.geoboundaries.org/index.html#getdata)",
           
          "[Humanitarian Data Exchange](https://data.humdata.org/dataset/cod-ab-nga)",
           "[Humanitarian Data Exchange](https://data.humdata.org/dataset/cod-ab-nga)",
           "[Humanitarian Data Exchange](https://data.humdata.org/dataset/cod-ab-nga)",
           "[Humanitarian Data Exchange](https://data.humdata.org/dataset/cod-ab-nga)",
           "[Humanitarian Data Exchange](https://data.humdata.org/dataset/cod-ab-nga)",
           "[Humanitarian Data Exchange](https://data.humdata.org/dataset/cod-ab-nga)",
           "[Humanitarian Data Exchange](https://data.humdata.org/dataset/cod-ab-nga)",
           "[Humanitarian Data Exchange](https://data.humdata.org/dataset/cod-ab-nga)",
           
           "[ WPdx Global Data Repositories](https://www.waterpointdata.org/access-data/)")
  )

# with reference to this guide on kableExtra:
# https://cran.r-project.org/web/packages/kableExtra/vignettes/awesome_table_in_html.html
# kable_material is the name of the kable theme
# 'hover' for to highlight row when hovering, 'scale_down' to adjust table to fit page width
library(knitr)
library(kableExtra)
kable(datasets, caption="Datasets Used") %>%
  kable_material("hover", latex_options="scale_down")
Datasets Used
Type Name Format Source
Geospatial geoBoundaries-NGA-ADM2 .dbf [geoBoundaries](https://www.geoboundaries.org/index.html#getdata)
Geospatial geoBoundaries-NGA-ADM2 .geojson [geoBoundaries](https://www.geoboundaries.org/index.html#getdata)
Geospatial geoBoundaries-NGA-ADM2 .prj [geoBoundaries](https://www.geoboundaries.org/index.html#getdata)
Geospatial geoBoundaries-NGA-ADM2 .shp [geoBoundaries](https://www.geoboundaries.org/index.html#getdata)
Geospatial geoBoundaries-NGA-ADM2 .shx [geoBoundaries](https://www.geoboundaries.org/index.html#getdata)
Geospatial geoBoundaries-NGA-ADM2 .topojson [geoBoundaries](https://www.geoboundaries.org/index.html#getdata)
Geospatial nga_admbnda_adm2_osgof_20190417 .CPG [Humanitarian Data Exchange](https://data.humdata.org/dataset/cod-ab-nga)
Geospatial nga_admbnda_adm2_osgof_20190417 .dbf [Humanitarian Data Exchange](https://data.humdata.org/dataset/cod-ab-nga)
Geospatial nga_admbnda_adm2_osgof_20190417 .prj [Humanitarian Data Exchange](https://data.humdata.org/dataset/cod-ab-nga)
Geospatial nga_admbnda_adm2_osgof_20190417 .sbn [Humanitarian Data Exchange](https://data.humdata.org/dataset/cod-ab-nga)
Geospatial nga_admbnda_adm2_osgof_20190417 .sbx [Humanitarian Data Exchange](https://data.humdata.org/dataset/cod-ab-nga)
Geospatial nga_admbnda_adm2_osgof_20190417 .shp [Humanitarian Data Exchange](https://data.humdata.org/dataset/cod-ab-nga)
Geospatial nga_admbnda_adm2_osgof_20190417 .shp [Humanitarian Data Exchange](https://data.humdata.org/dataset/cod-ab-nga)
Geospatial nga_admbnda_adm2_osgof_20190417 .shx [Humanitarian Data Exchange](https://data.humdata.org/dataset/cod-ab-nga)
Aspatial WPdx .csv [ WPdx Global Data Repositories](https://www.waterpointdata.org/access-data/)

3.0 Data Wrangling : Geospatial Data

3.1 Importing and Transforming Geospatial Data

We will begin by importing Geospatial data into R by using the st_read() of sf package. It imports the nga_admbnda_adm2_osgof_20190417 shapefile into R as a polygon data frame. We provide 2 arguments - dsn (which is the data path) and layer (the shapefile name)

We use the st_transform() to perform projection transaction.

geoNGA <- st_read("data/geospatial", 
                  layer = "geoBoundaries-NGA-ADM2")
Reading layer `geoBoundaries-NGA-ADM2' from data source 
  `C:\mayurims\IS415-GAA\Take-Home_Ex\Take-Home_Ex01\data\geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 774 features and 6 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 2.668534 ymin: 4.273007 xmax: 14.67882 ymax: 13.89442
Geodetic CRS:  WGS 84
NGA <- st_read(dsn = "data/geospatial", 
                  layer = "nga_admbnda_adm2_osgof_20190417")
Reading layer `nga_admbnda_adm2_osgof_20190417' from data source 
  `C:\mayurims\IS415-GAA\Take-Home_Ex\Take-Home_Ex01\data\geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 774 features and 16 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 2.668534 ymin: 4.273007 xmax: 14.67882 ymax: 13.89442
Geodetic CRS:  WGS 84

We can use the glimpse() of dplyr to know more about the associated attribute information of the dataframe.

glimpse(geoNGA)
Rows: 774
Columns: 7
$ shapeName  <chr> "Aba North", "Aba South", "Arochukwu", "Bende", "Ikwuano", …
$ pcode      <chr> "NG001001", "NG001002", "NG001003", "NG001004", "NG001005",…
$ level      <chr> "ADM2", "ADM2", "ADM2", "ADM2", "ADM2", "ADM2", "ADM2", "AD…
$ shapeID    <chr> "NGA-ADM2-13203401B25860527", "NGA-ADM2-13203401B76240303",…
$ shapeGroup <chr> "NGA", "NGA", "NGA", "NGA", "NGA", "NGA", "NGA", "NGA", "NG…
$ shapeType  <chr> "ADM2", "ADM2", "ADM2", "ADM2", "ADM2", "ADM2", "ADM2", "AD…
$ geometry   <MULTIPOLYGON [°]> MULTIPOLYGON (((7.387495 5...., MULTIPOLYGON (…
glimpse(NGA)
Rows: 774
Columns: 17
$ Shape_Leng <dbl> 0.2370744, 0.2624772, 3.0753158, 2.5379842, 0.6871498, 1.06…
$ Shape_Area <dbl> 0.0015239210, 0.0035311037, 0.3268678399, 0.0683785064, 0.0…
$ ADM2_EN    <chr> "Aba North", "Aba South", "Abadam", "Abaji", "Abak", "Abaka…
$ ADM2_PCODE <chr> "NG001001", "NG001002", "NG008001", "NG015001", "NG003001",…
$ ADM2_REF   <chr> "Aba North", "Aba South", "Abadam", "Abaji", "Abak", "Abaka…
$ ADM2ALT1EN <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ ADM2ALT2EN <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ ADM1_EN    <chr> "Abia", "Abia", "Borno", "Federal Capital Territory", "Akwa…
$ ADM1_PCODE <chr> "NG001", "NG001", "NG008", "NG015", "NG003", "NG011", "NG02…
$ ADM0_EN    <chr> "Nigeria", "Nigeria", "Nigeria", "Nigeria", "Nigeria", "Nig…
$ ADM0_PCODE <chr> "NG", "NG", "NG", "NG", "NG", "NG", "NG", "NG", "NG", "NG",…
$ date       <date> 2016-11-29, 2016-11-29, 2016-11-29, 2016-11-29, 2016-11-29…
$ validOn    <date> 2019-04-17, 2019-04-17, 2019-04-17, 2019-04-17, 2019-04-17…
$ validTo    <date> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ SD_EN      <chr> "Abia South", "Abia South", "Borno North", "Federal Capital…
$ SD_PCODE   <chr> "NG00103", "NG00103", "NG00802", "NG01501", "NG00302", "NG0…
$ geometry   <MULTIPOLYGON [°]> MULTIPOLYGON (((7.401109 5...., MULTIPOLYGON (…

From the attributes visible, we can see the HDE source (NGA) has a column called ‘ADM1_EN’ which can be used to filter for water points in Osun, Nigeria. However, this is not present in the geoBoundaries dataset. As a result, we will only be using the Humanitarian Data Exchange source - nga_admbnda_adm2.

However, NGA sf data.frame consists of many redundent fields. Hence, the code chunk below uses select() of dplyr to retain column 3, 4, 8 and 9.

NGA <- NGA %>%
  select(c(3:4, 8:9))

We then use the filter() to filter out the polygon features of Osun.

NGA <- NGA %>% filter(ADM1_EN == "Osun")

Now, we use the st_crs() to check the coordinate system of the data. As we can see, it uses the WGS 84 coordinate system. The data is using a Geographic projected system, however, this is system is not appropriate since we need to use distance and area measures.

st_crs(NGA)
Coordinate Reference System:
  User input: WGS 84 
  wkt:
GEOGCRS["WGS 84",
    DATUM["World Geodetic System 1984",
        ELLIPSOID["WGS 84",6378137,298.257223563,
            LENGTHUNIT["metre",1]]],
    PRIMEM["Greenwich",0,
        ANGLEUNIT["degree",0.0174532925199433]],
    CS[ellipsoidal,2],
        AXIS["latitude",north,
            ORDER[1],
            ANGLEUNIT["degree",0.0174532925199433]],
        AXIS["longitude",east,
            ORDER[2],
            ANGLEUNIT["degree",0.0174532925199433]],
    ID["EPSG",4326]]

Hence, we use st_transform() and not st_set_crs() as st_set_crs() assigns the EPSG code to the data frame. And we need to transform the data frame from geographic to projected coordinate system. We will be using crs=26392 (found from the EPSG for Nigeria).

NGA <- st_transform(NGA, crs = 26392)

Verify that the CRS of NGA dataframe has changed.

st_crs(NGA)
Coordinate Reference System:
  User input: EPSG:26392 
  wkt:
PROJCRS["Minna / Nigeria Mid Belt",
    BASEGEOGCRS["Minna",
        DATUM["Minna",
            ELLIPSOID["Clarke 1880 (RGS)",6378249.145,293.465,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4263]],
    CONVERSION["Nigeria Mid Belt",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",4,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",8.5,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",0.99975,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",670553.98,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",0,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["(E)",east,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["(N)",north,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Engineering survey, topographic mapping."],
        AREA["Nigeria between 6°30'E and 10°30'E, onshore and offshore shelf."],
        BBOX[3.57,6.5,13.53,10.51]],
    ID["EPSG",26392]]

3.2 Data Pre-processing

3.2.1 Dropping Invalid Dimensions

Since, we only have one dataframe, there are no invalid dimensions, and hence, this step is not required.

3.2.2 Invalid Geometries

The st_is_valid() function checks whether a geometry is valid and returns the indices. Whereas, the length() gives you a count of the indices with invalid geometries.

length(which(st_is_valid(NGA) == FALSE))
[1] 0

None of the values are Invalid, so we are good to go!!

3.2.3 Checking for Duplicated Names

We need to check for duplicate name in the data main data fields. Using duplicated() of Base R, we can flag out LGA names that might be duplicated as shown in the code chunk below.

NGA$ADM2_EN[duplicated(NGA$ADM2_EN)==TRUE]
character(0)

There are no duplicated values, so we are good to go!

3.2.4 Initial Visualization

plot(st_geometry(NGA))

4.0 Data Wrangling : Aspatial Data

4.1 Importing Aspatial Data

Since the WPdx data is in CSV format, we will use read_csv() of readr package to import WPdx.csv. The output is called wp_nga and is a tibble dataframe

wp_nga <- read_csv("data/aspatial/WPdx.csv") %>%
  filter(`#clean_country_name` == "Nigeria" & `#clean_adm1` == "Osun")

4.2 Converting water point data into sf point features

Converting an aspatial data into an sf data.frame involves two steps.

First, we need to convert the wkt field into sfc field by using st_as_sfc() function. The function stores it in a tibble data format.

wp_nga$Geometry = st_as_sfc(wp_nga$`New Georeferenced Column`)
wp_nga
# A tibble: 5,557 × 71
   row_id `#source`      #lat_…¹ #lon_…² #repo…³ #stat…⁴ #wate…⁵ #wate…⁶ #wate…⁷
    <dbl> <chr>            <dbl>   <dbl> <chr>   <chr>   <chr>   <chr>   <chr>  
 1 429123 GRID3             8.02    5.06 08/29/… Unknown <NA>    <NA>    Tapsta…
 2  70566 Federal Minis…    7.32    4.79 05/11/… No      Protec… Well    Mechan…
 3  70578 Federal Minis…    7.76    4.56 05/11/… No      Boreho… Well    Mechan…
 4  66401 Federal Minis…    8.03    4.64 04/30/… No      Boreho… Well    Mechan…
 5 422190 GRID3             7.87    4.88 08/29/… Unknown <NA>    <NA>    Tapsta…
 6 422064 GRID3             7.7     4.89 08/29/… Unknown <NA>    <NA>    Tapsta…
 7  65607 Federal Minis…    7.89    4.71 05/12/… No      Boreho… Well    Mechan…
 8  68989 Federal Minis…    7.51    4.27 05/07/… No      Boreho… Well    <NA>   
 9  67708 Federal Minis…    7.48    4.35 04/29/… Yes     Boreho… Well    Mechan…
10  66419 Federal Minis…    7.63    4.50 05/08/… Yes     Boreho… Well    Hand P…
# … with 5,547 more rows, 62 more variables: `#water_tech_category` <chr>,
#   `#facility_type` <chr>, `#clean_country_name` <chr>, `#clean_adm1` <chr>,
#   `#clean_adm2` <chr>, `#clean_adm3` <chr>, `#clean_adm4` <chr>,
#   `#install_year` <dbl>, `#installer` <chr>, `#rehab_year` <lgl>,
#   `#rehabilitator` <lgl>, `#management_clean` <chr>, `#status_clean` <chr>,
#   `#pay` <chr>, `#fecal_coliform_presence` <chr>,
#   `#fecal_coliform_value` <dbl>, `#subjective_quality` <chr>, …

Next, we use the st_sf() to convert the tibble data.frame into an sf object. It is also important for us to include the referencing system of the data into the sf object.

wp_sf <- st_sf(wp_nga, crs=4326)
wp_sf
Simple feature collection with 5557 features and 70 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 4.032004 ymin: 7.060309 xmax: 5.06 ymax: 8.061898
Geodetic CRS:  WGS 84
# A tibble: 5,557 × 71
   row_id `#source`      #lat_…¹ #lon_…² #repo…³ #stat…⁴ #wate…⁵ #wate…⁶ #wate…⁷
 *  <dbl> <chr>            <dbl>   <dbl> <chr>   <chr>   <chr>   <chr>   <chr>  
 1 429123 GRID3             8.02    5.06 08/29/… Unknown <NA>    <NA>    Tapsta…
 2  70566 Federal Minis…    7.32    4.79 05/11/… No      Protec… Well    Mechan…
 3  70578 Federal Minis…    7.76    4.56 05/11/… No      Boreho… Well    Mechan…
 4  66401 Federal Minis…    8.03    4.64 04/30/… No      Boreho… Well    Mechan…
 5 422190 GRID3             7.87    4.88 08/29/… Unknown <NA>    <NA>    Tapsta…
 6 422064 GRID3             7.7     4.89 08/29/… Unknown <NA>    <NA>    Tapsta…
 7  65607 Federal Minis…    7.89    4.71 05/12/… No      Boreho… Well    Mechan…
 8  68989 Federal Minis…    7.51    4.27 05/07/… No      Boreho… Well    <NA>   
 9  67708 Federal Minis…    7.48    4.35 04/29/… Yes     Boreho… Well    Mechan…
10  66419 Federal Minis…    7.63    4.50 05/08/… Yes     Boreho… Well    Hand P…
# … with 5,547 more rows, 62 more variables: `#water_tech_category` <chr>,
#   `#facility_type` <chr>, `#clean_country_name` <chr>, `#clean_adm1` <chr>,
#   `#clean_adm2` <chr>, `#clean_adm3` <chr>, `#clean_adm4` <chr>,
#   `#install_year` <dbl>, `#installer` <chr>, `#rehab_year` <lgl>,
#   `#rehabilitator` <lgl>, `#management_clean` <chr>, `#status_clean` <chr>,
#   `#pay` <chr>, `#fecal_coliform_presence` <chr>,
#   `#fecal_coliform_value` <dbl>, `#subjective_quality` <chr>, …

Like step 3.2 Data Pre-processing, we transform the projection from wgs84 to the appropriate projected coordinate system of Nigeria.

wp_sf <- wp_sf %>%
  st_transform(crs = 26392)

4.3 Data Wrangling for Water Data Point

Exploratory Data Analysis (EDA) helps to gain initial understanding of the data. The freq() of funModeling package is used to reveal the distribution of water point status visually.

freq(data = wp_sf,
     input = '#status_clean')

                     #status_clean frequency percentage cumulative_perc
1                       Functional      2319      41.73           41.73
2                   Non-Functional      2008      36.13           77.86
3                             <NA>       748      13.46           91.32
4      Functional but needs repair       248       4.46           95.78
5 Non-Functional due to dry season       151       2.72           98.50
6        Functional but not in use        63       1.13           99.63
7                        Abandoned        15       0.27           99.90
8         Abandoned/Decommissioned         5       0.09          100.00

The diagram shows that there are nine classes present in the ‘status_clean’ field. Hence, now we will be performing data wrangling tasks to create 3 data object - Functional, Non-Functional and Unknown.

We use rename() function from the dplyr package to rename the column from #status_clean to status_clean for easier handling in subsequent steps. select() is used to include status_clean in the output sf data.frame. We use the mutate() and replace_na() functions to re-code all the NA values in status_clean into unknown.

wp_sf_nga <- wp_sf %>% 
  rename(status_clean = '#status_clean') %>%
  select(status_clean) %>%
  mutate(status_clean = replace_na(
    status_clean, "unknown"))

4.3.1 Extracting Water Point Data

Now we are ready to extract the water point data according to their status.

The code chunk below is used to extract functional water point.

wp_functional <- wp_sf_nga %>%
  filter(status_clean %in%
           c("Functional",
             "Functional but not in use",
             "Functional but needs repair"))

The code chunk below is used to extract nonfunctional water point.

wp_nonfunctional <- wp_sf_nga %>%
  filter(status_clean %in%
           c("Abandoned/Decommissioned",
             "Abandoned",
             "Non-Functional due to dry season",
             "Non-Functional",
             "Non functional due to dry season"))

The code chunk below is used to extract water point with unknown status.

wp_unknown <- wp_sf_nga %>%
  filter(status_clean == "unknown")

Performing a quick EDA on the derived sfa.dataframes

freq(data = wp_functional,
     input = 'status_clean')

                 status_clean frequency percentage cumulative_perc
1                  Functional      2319      88.17           88.17
2 Functional but needs repair       248       9.43           97.60
3   Functional but not in use        63       2.40          100.00
freq(data = wp_nonfunctional,
     input = 'status_clean')

                      status_clean frequency percentage cumulative_perc
1                   Non-Functional      2008      92.15           92.15
2 Non-Functional due to dry season       151       6.93           99.08
3                        Abandoned        15       0.69           99.77
4         Abandoned/Decommissioned         5       0.23          100.00
freq(data = wp_unknown,
     input = 'status_clean')

  status_clean frequency percentage cumulative_perc
1      unknown       748        100             100

We can see from the map below, the proportion of functional and non-functional water is quite similar.

tmap_mode("view")
tm_shape(wp_functional) +
 tm_dots(col = "status_clean",
         pal = "blue",
         title = "Functional") +
tm_shape(wp_nonfunctional) +
 tm_dots(col = "status_clean",
         pal = "red",
         title = "Non-Functional") +
  tm_view(set.zoom.limits = c(8.5,15)) 

4.3.2 Performing Point-In Polygon Count

Next, we want to find out the number of total, functional, nonfunctional and unknown water points in Osun State. This is performed in the following code chunk. First, it identifies the functional water points in each LGA by using st_intersects() of sf package. Next, length() is used to calculate the number of functional water points that fall inside each LGA.

NGA_wp <- NGA %>% 
  mutate(`total_wp` = lengths(
    st_intersects(NGA, wp_sf_nga))) %>%
  mutate(`wp_functional` = lengths(
    st_intersects(NGA, wp_functional))) %>%
  mutate(`wp_nonfunctional` = lengths(
    st_intersects(NGA, wp_nonfunctional))) %>%
  mutate(`wp_unknown` = lengths(
    st_intersects(NGA, wp_unknown)))
NGA_wp
Simple feature collection with 30 features and 8 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 176503.2 ymin: 331434.7 xmax: 291043.8 ymax: 454520.1
Projected CRS: Minna / Nigeria Mid Belt
First 10 features:
          ADM2_EN ADM2_PCODE ADM1_EN ADM1_PCODE                       geometry
1        Aiyedade   NG030001    Osun      NG030 MULTIPOLYGON (((213526.6 34...
2        Aiyedire   NG030002    Osun      NG030 MULTIPOLYGON (((212542.6 40...
3  Atakumosa East   NG030003    Osun      NG030 MULTIPOLYGON (((265746.8 37...
4  Atakumosa West   NG030004    Osun      NG030 MULTIPOLYGON (((248871.4 40...
5      Boluwaduro   NG030005    Osun      NG030 MULTIPOLYGON (((266092.2 43...
6          Boripe   NG030006    Osun      NG030 MULTIPOLYGON (((255072.5 43...
7       Ede North   NG030007    Osun      NG030 MULTIPOLYGON (((236386.9 41...
8       Ede South   NG030008    Osun      NG030 MULTIPOLYGON (((236386.9 41...
9        Egbedore   NG030009    Osun      NG030 MULTIPOLYGON (((220756 4317...
10         Ejigbo   NG030010    Osun      NG030 MULTIPOLYGON (((214422.1 42...
   total_wp wp_functional wp_nonfunctional wp_unknown
1       389           157              154         78
2       175            89               57         29
3       223            98               92         33
4       246           111              103         32
5       129            63               51         15
6       177            79               85         13
7       216           141               50         25
8       146            72               39         35
9       142            63               44         35
10      434           274              126         34

We then visualise attributes by using statistcal graph. We use functions of ggplot2 package to reveal the distribution of total water points in Osun’s LGA using histogram.

ggplot(data = NGA_wp,
       aes(x = total_wp)) + 
  geom_histogram(bins=20,
                 color="black",
                 fill="light blue") +
  geom_vline(aes(xintercept=mean(
    total_wp, na.rm=T)),
             color="red", 
             linetype="dashed", 
             size=0.8) +
  ggtitle("Distribution of total water points") +
  xlab("No. of water points") +
  ylab("No. of\nLGAs") +
  theme(axis.title.y=element_text(angle = 0))

4.4 Choropleth Mapping

We will be calculating the proportion of Functional and Non-Functional water points and mapping them to see which area of Osun has more proportions of functional and non-functional water points.

NGA_wp_total <- NGA_wp %>%
  mutate(pct_functional = wp_functional/total_wp) %>%
  mutate(pct_nonfunctional = wp_nonfunctional/total_wp)
tm_shape(NGA_wp_total) +
  tm_fill("pct_functional",
          n = 10,
          style = "equal",
          palette = "Blues",
          legend.hist = TRUE) +
  tm_borders(lwd = 0.1,
             alpha = 1) +
  tm_layout(main.title = "Rate map of functional water point by LGAs",
            legend.outside = TRUE)
tm_shape(NGA_wp_total) +
  tm_fill("pct_nonfunctional",
          n = 10,
          style = "equal",
          palette = "Blues",
          legend.hist = TRUE) +
  tm_borders(lwd = 0.1,
             alpha = 1) +
  tm_layout(main.title = "Rate map of non-functional water point by LGAs",
            legend.outside = TRUE)

5.0 Combined Data Wrangling : Geospatial & Aspatial Data

This is an essential step - the process of getting data from its raw input into a format which can be used for analysis.

5.1 Converting sf data frames to sp’s Spatial* Class

We use the as_spatial() function to convert the three geospatial data from simple feature data frame to sp’s Spatial* class.

wp_functional_spatial = as_Spatial(wp_functional)
wp_nonfunctional_spatial = as_Spatial(wp_nonfunctional)
NGA_spatial <- as_Spatial(NGA)
NGA_spatial
class       : SpatialPolygonsDataFrame 
features    : 30 
extent      : 176503.2, 291043.8, 331434.7, 454520.1  (xmin, xmax, ymin, ymax)
crs         : +proj=tmerc +lat_0=4 +lon_0=8.5 +k=0.99975 +x_0=670553.98 +y_0=0 +a=6378249.145 +rf=293.465 +towgs84=-92,-93,122,0,0,0,0 +units=m +no_defs 
variables   : 4
names       :  ADM2_EN, ADM2_PCODE, ADM1_EN, ADM1_PCODE 
min values  : Aiyedade,   NG030001,    Osun,      NG030 
max values  :   Osogbo,   NG030030,    Osun,      NG030 
wp_functional_spatial
class       : SpatialPointsDataFrame 
features    : 2630 
extent      : 177285.9, 290751, 343128.1, 450859.7  (xmin, xmax, ymin, ymax)
crs         : +proj=tmerc +lat_0=4 +lon_0=8.5 +k=0.99975 +x_0=670553.98 +y_0=0 +a=6378249.145 +rf=293.465 +towgs84=-92,-93,122,0,0,0,0 +units=m +no_defs 
variables   : 1
names       :              status_clean 
min values  :                Functional 
max values  : Functional but not in use 
wp_nonfunctional_spatial
class       : SpatialPointsDataFrame 
features    : 2179 
extent      : 180539, 290616, 340054.1, 450780.1  (xmin, xmax, ymin, ymax)
crs         : +proj=tmerc +lat_0=4 +lon_0=8.5 +k=0.99975 +x_0=670553.98 +y_0=0 +a=6378249.145 +rf=293.465 +towgs84=-92,-93,122,0,0,0,0 +units=m +no_defs 
variables   : 1
names       :                     status_clean 
min values  :                        Abandoned 
max values  : Non-Functional due to dry season 

5.2 Converting from Spatial* classes to sp format

In order to use the spatstat for our analysis, we need our data to be in the ppp object form. Hence, we first need to convert them into Spatial object first and then into ppp object.

# convert into respective sp (in our case, either polygons or points)
wp_functional_sp <- as(wp_functional_spatial, "SpatialPoints")
wp_nonfunctional_sp <- as(wp_nonfunctional_spatial, "SpatialPoints")
NGA_sp <-as(NGA_spatial, "SpatialPolygons")
wp_functional_sp
class       : SpatialPoints 
features    : 2630 
extent      : 177285.9, 290751, 343128.1, 450859.7  (xmin, xmax, ymin, ymax)
crs         : +proj=tmerc +lat_0=4 +lon_0=8.5 +k=0.99975 +x_0=670553.98 +y_0=0 +a=6378249.145 +rf=293.465 +towgs84=-92,-93,122,0,0,0,0 +units=m +no_defs 
wp_nonfunctional_sp
class       : SpatialPoints 
features    : 2179 
extent      : 180539, 290616, 340054.1, 450780.1  (xmin, xmax, ymin, ymax)
crs         : +proj=tmerc +lat_0=4 +lon_0=8.5 +k=0.99975 +x_0=670553.98 +y_0=0 +a=6378249.145 +rf=293.465 +towgs84=-92,-93,122,0,0,0,0 +units=m +no_defs 
NGA_sp
class       : SpatialPolygons 
features    : 30 
extent      : 176503.2, 291043.8, 331434.7, 454520.1  (xmin, xmax, ymin, ymax)
crs         : +proj=tmerc +lat_0=4 +lon_0=8.5 +k=0.99975 +x_0=670553.98 +y_0=0 +a=6378249.145 +rf=293.465 +towgs84=-92,-93,122,0,0,0,0 +units=m +no_defs 

5.3 Converting from sp format to spatstat ppp format

We can’t convert SpatialPolygons to ppp format - nor is there any need to. Hence, we won’t be including our ‘base map’, NGA.

# from sp object, convert into ppp format
wp_functional_ppp <- as(wp_functional_sp, "ppp")
wp_nonfunctional_ppp <- as(wp_nonfunctional_sp, "ppp")

The below map shows the point patterns for both functional and non-functional water points.

par(mfrow=c(1,2))
plot(wp_nonfunctional_ppp)
plot(wp_functional_ppp)

5.3.1 Handling Duplicated Points + Jittering

any(duplicated(wp_functional_ppp)) 
[1] FALSE
any(duplicated(wp_nonfunctional_ppp)) 
[1] FALSE

Since there is no duplication, we don’t have to apply the process of Jittering.

5.4 Creating Owin Object

We need to now confine the analysis with a geographical area - Osun State and we do this by creating a object called owin which represent the polygonal region. The below code covert the SpatialPolygon (NGA_sp) created into an owin object.

NGA_owin <- as(NGA_sp, "owin")
plot(NGA_owin)

5.5 Combining point events object and owin object

In this step, we extract the functional and non-functional water points that are located within Osun, Nigeria. This combines both the point and polygon feature into one ppp object class.

wp_functional_ppp = wp_functional_ppp[NGA_owin]
wp_nonfunctional_ppp = wp_nonfunctional_ppp[NGA_owin]
par(mfrow=c(1,2))
plot(wp_nonfunctional_ppp)
plot(wp_functional_ppp)

6.0 Exploratory Spatial Data Analysis

In this section, we use the Hands-on Excercise 04 to help

  • Derive kernel density maps of functional and non-functional water points. Using appropriate tmap functions,

  • Display the kernel density maps on openstreetmap of Osub State, Nigeria.

  • Describe the spatial patterns revealed by the kernel density maps. Highlight the advantage of kernel density map over point map.

6.1 Kernel Density Estimation

6.1.1 Computing Kernel Density Estimation

There are two types of bandwidth methods - Fixed (Automatic) and Adaptive bandwidth method. These methods employ different uniform bases in density calculation.

Computing using Automatic Bandwidth selection method

We can compute the kernel density by using the bw.ppl() or bw.diggle(). As learned in Chapter 04, the ppl() method is prefered for patterns consisting predominantly of prominent clusters. Whereas, bw.diggle() is best used to detect a single tight cluster in the midst of random noise.

From the maps below, we can see that bw.ppl() method is better able to identify the prominent clusters as the data does not contain a single tight cluster. Hence, we will be using the bw.ppl() method.

bw.diggle() Method

kde_wpfunctional_bw_diggle <- density(wp_functional_ppp,
                              sigma=bw.diggle,
                              edge=TRUE,
                            kernel="gaussian") 
kde_wpnonfunctional_bw_diggle <- density(wp_nonfunctional_ppp,
                              sigma=bw.diggle,
                              edge=TRUE,
                            kernel="gaussian") 


par(mfrow=c(1,2))
plot(kde_wpfunctional_bw_diggle,
     main = "Functional Water Points",
     ribside=c("right"))
plot(kde_wpnonfunctional_bw_diggle,
     main = "Non-Functional Water Points",
     ribside=c("right"))

bw.ppl() Method

kde_wpfunctional_bw <- density(wp_functional_ppp,
                              sigma=bw.ppl,
                              edge=TRUE,
                            kernel="gaussian") 
kde_wpnonfunctional_bw <- density(wp_nonfunctional_ppp,
                              sigma=bw.ppl,
                              edge=TRUE,
                            kernel="gaussian") 


par(mfrow=c(1,2))
plot(kde_wpfunctional_bw,
     main = "Functional Water Points",
     ribside=c("right"))
plot(kde_wpnonfunctional_bw,
     main = "Non-Functional Water Points",
     ribside=c("right"))

Computing using Adaptive Bandwidth selection method

kde_wpfunctional_adaptive <- adaptive.density(wp_functional_ppp, method="kernel")

kde_wpnonfunctional_adaptive <- adaptive.density(wp_nonfunctional_ppp, method="kernel")

par(mfrow=c(1,2))
plot(kde_wpfunctional_adaptive,
     main = "Functional Water Points",
     ribside=c("right"))
plot(kde_wpnonfunctional_adaptive,
     main = "Non-Functional Water Points",
     ribside=c("right"))

Comparing Automated and Adapting Bandwidth Methods (side-by-side)

Functional Water Point

par(mfrow=c(1,2))
plot(kde_wpfunctional_bw,
     main = "Functional Water Points - Automated",
     ribside=c("right"))
plot(kde_wpfunctional_adaptive,
     main = "Functional Water Points - Adaptive",
     ribside=c("right"))

Non-Functional Water Point

par(mfrow=c(1,2))
plot(kde_wpnonfunctional_bw,
     main = "Non-Functional Water Points - Automated",
     ribside=c("right"))
plot(kde_wpnonfunctional_adaptive,
     main = "Non-Functional Water Points - Adaptive",
     ribside=c("right"))

6.1.2 Rescalling KDE Values

As we can the KDE values are small (ranging from 0 to 0.000035). This is because the default unit of measurement of svy21 is in meter. As a result, the density values computed is in “number of points per square meter”. So rescale() is used to covert the unit of measurement from meter to kilometer.

wp_functional_ppp_km <- rescale(wp_functional_ppp, 1000, "km")
wp_nonfunctional_ppp_km <- rescale(wp_nonfunctional_ppp, 1000, "km")

Now we re-plot the graphs

Automated Bandwidth Method

kde_wpfunctional_km <- density(wp_functional_ppp_km,
                              sigma=bw.ppl,
                              edge=TRUE,
                            kernel="gaussian") 
kde_wpnonfunctional_km <- density(wp_nonfunctional_ppp_km,
                              sigma=bw.ppl,
                              edge=TRUE,
                            kernel="gaussian") 

par(mfrow=c(1,2))
plot(kde_wpfunctional_bw,
     main = "Functional Water Points",
     ribside=c("right"))
plot(kde_wpnonfunctional_bw,
     main = "Non-Functional Water Points",
     ribside=c("right"))

Adaptive Bandwidth Method

kde_wpfunctional_adaptive_km <- adaptive.density(wp_functional_ppp_km, method="kernel")

kde_wpnonfunctional_adaptive_km <- adaptive.density(wp_nonfunctional_ppp_km, method="kernel")

par(mfrow=c(1,2))
plot(kde_wpfunctional_adaptive,
     main = "Functional Water Points",
     ribside=c("right"))
plot(kde_wpnonfunctional_adaptive,
     main = "Non-Functional Water Points",
     ribside=c("right"))

For this assignment, we will be using the Automated Bandwidth method because it defines its base in geographical space, where as the Adaptive method defines it in population (Reference). As we learned in Chapter 04, Automated Bandwidth is very sensitive to highly skew distribution of spatial point patterns over geographical units (e.g - urban versus rural). However, since we don’t have highly skewed data (as seen in the distribution graph in 4.3.2 Performing Point-In Polygon Count), we can use Fixed/Automated Bandwidth method.

6.2 Converting KDE output into grid object

gridded_wpfunctional <- as.SpatialGridDataFrame.im(kde_wpfunctional_km)
gridded_wpnonfunctional <- as.SpatialGridDataFrame.im(kde_wpnonfunctional_km)

spplot(gridded_wpfunctional)

spplot(gridded_wpnonfunctional)

6.2.1 Converting Gridded Output into Raster

Next, we convert the gridded kernal density objects into RasterLayer object by using raster() of raster package.

kde_wpfunctional_raster <- raster(gridded_wpfunctional)
kde_wpfunctional_raster
class      : RasterLayer 
dimensions : 128, 128, 16384  (nrow, ncol, ncell)
resolution : 0.8948485, 0.9616045  (x, y)
extent     : 176.5032, 291.0438, 331.4347, 454.5201  (xmin, xmax, ymin, ymax)
crs        : NA 
source     : memory
names      : v 
values     : -4.99773e-16, 10.55944  (min, max)
kde_wpnonfunctional_raster <- raster(gridded_wpnonfunctional)
kde_wpnonfunctional_raster
class      : RasterLayer 
dimensions : 128, 128, 16384  (nrow, ncol, ncell)
resolution : 0.8948485, 0.9616045  (x, y)
extent     : 176.5032, 291.0438, 331.4347, 454.5201  (xmin, xmax, ymin, ymax)
crs        : NA 
source     : memory
names      : v 
values     : -2.52505e-16, 9.25861  (min, max)

6.2.2 Assigning Projection Systems

projection(kde_wpfunctional_raster) <- CRS("+init=EPSG:26392 +datum:WGS84 +units=km")
kde_wpfunctional_raster
class      : RasterLayer 
dimensions : 128, 128, 16384  (nrow, ncol, ncell)
resolution : 0.8948485, 0.9616045  (x, y)
extent     : 176.5032, 291.0438, 331.4347, 454.5201  (xmin, xmax, ymin, ymax)
crs        : +proj=tmerc +lat_0=4 +lon_0=8.5 +k=0.99975 +x_0=670553.98 +y_0=0 +a=6378249.145 +rf=293.465 +units=km +no_defs 
source     : memory
names      : v 
values     : -4.99773e-16, 10.55944  (min, max)
projection(kde_wpnonfunctional_raster) <- CRS("+init=EPSG:26392 +datum:WGS84 +units=km")
kde_wpnonfunctional_raster
class      : RasterLayer 
dimensions : 128, 128, 16384  (nrow, ncol, ncell)
resolution : 0.8948485, 0.9616045  (x, y)
extent     : 176.5032, 291.0438, 331.4347, 454.5201  (xmin, xmax, ymin, ymax)
crs        : +proj=tmerc +lat_0=4 +lon_0=8.5 +k=0.99975 +x_0=670553.98 +y_0=0 +a=6378249.145 +rf=293.465 +units=km +no_defs 
source     : memory
names      : v 
values     : -2.52505e-16, 9.25861  (min, max)

6.3 Kernel Density Maps on OpenStreetMap

Now, as the assignment requirements has specified, we should plot our kernel density maps on OpenStreetMap. Since we’ll be plotting a lot of kernel density maps, let’s create a function:

density_map <- function(raster_object, map_title) {
  tmap_mode("view")
  tm_basemap("OpenStreetMap") +
tm_shape(raster_object) +
  tm_raster("v", alpha=0.9) + 
  tm_layout(legend.position = c("right", "bottom"), 
            legend.height = 0.5, 
            legend.width = 0.4,
            main.title = map_title,
            main.title.position = 'center',
            main.title.size = 1,
            frame = TRUE) + 
  tm_view(set.zoom.limits = c(8, 13))
  } 
kde_wpfunctional_density_map <- density_map(kde_wpfunctional_raster, map_title = "Functional Water Points in Osun State")
kde_wpnonfunctional_density_map <- density_map(kde_wpnonfunctional_raster, map_title = "Non-Functional Water Points in Osun State")

Functional Density Map

kde_wpfunctional_density_map

Non-Functional Density Map

kde_wpnonfunctional_density_map

Functional Density Map

tmap_mode('plot')
tm_basemap("OpenStreetMap") +
tm_shape(kde_wpfunctional_raster) +
  tm_raster("v")

Non-Functional Density Map

tmap_mode('plot')
tm_basemap("OpenStreetMap") +
tm_shape(kde_wpnonfunctional_raster) +
  tm_raster("v")

6.4 Kernel Density Maps Analysis

As we can see in the map in 4.3 Data Wrangling for Water Data Point, the number of functional and non-functional water points are quite close, with there being 2319 Functional and 2008 Non-functional water points. As a result, both the density maps are similar. But what is interesting to note is that, there seems to be more concentrated density points for the Non-Functional water points, compared to the Functional water points. Further, from the density maps above, we can see that both the Functional and Non-Functional water points are relatively more concentrated in the center and the upper part of Osun and we don’t see that many water points in lower part of Osun. This patter is reflected in the map in [[5.5 Combining point events object and owin object]], where we see less concentration of points in the lower part of Osun.

6.5 Advantage of Kernel Density Map over Point Map

To understand the advantage of Kernel Density Map over Point Map, we first need to plot the two and compare the differences.

tmap_mode("plot")
tm_shape(NGA_wp) +
  tm_borders(alpha = 0.5) +
  tmap_options(check.and.fix = TRUE) +
tm_shape(wp_nonfunctional) +
  tm_dots(col="red", size=0.05) +
  tm_layout(main.title = "Non-Functional Water Points",
          main.title.position = "center",
          main.title.size = 1.2,
          frame = TRUE)

kde_wpnonfunctional_density_map

With the Kernel Density Map, denser areas with a heavier distribution of Non-Functional Water Points are easily spotted. This is because the kernel density z-estimate helps to smooth out the points in a given area. Compared to the point map which just shows the points. Further, the gradient colour available (ranging from yellow to green) helps in understanding the density/concentration of water pumps in the area. It clearly shows the viewer which are the areas with more non-functional water pumps, however, with the point map, the users have to gauge/estimate which are the denser with more non-functional water points.

Further, another advantage of Kernel Density Maps is that it uses the Inverse Distance Weighed method which is estimating cell values by using a linearly weighted combination of a set of sample points (Reference). This is quite useful, because it takes into consideration of water points that are further away (from the residential area) and hence, people might have to travel further to access the water points. And this is accounted by the kernel function, and not the Point Map.

Hence to conclude, the Kernal Density provides a quantitative value representing the concentration of points, where as this can only be observed/gauged in Point Map.

7.0 Second-order Spatial Point Patterns Analysis

For Functional Water Point in Osun Sate

  • H0: The distribution of the Functional Water Points are randomly distributed

  • H1: The distribution of the Functional Water Points are not randomly distributed

  • Confidence level : 99%

  • Significance level : 0.01

For Non-Functional Water Point in Osun Sate

  • H0: The distribution of the Non-Functional Water Points are randomly distributed

  • H1: The distribution of the Non-Functional Water Points are not randomly distributed

  • Confidence level : 99%

  • Significance level : 0.01

  • The null hypothesis will be rejected if p-value is smaller than alpha value of 0.001.

7.1 Analysing Spatial Point Process Using G-Function

We will be using the G-Function for analyse the spatial point process. This function deals with the cumulative distribution of the nearest neighbor distances. It computes the nearest neighbor distance for each event, which is then sorted from smallest to largest. This is used to construct a cumulative distribution.

7.1.1 Functional Water Point

Computing G-function estimation

G_wp_functional = Gest(wp_functional_ppp, correction = "border")
plot(G_wp_functional, xlim=c(0,500))

Performing Complete Spatial Randomness Test

G_wp_functional.csr <- envelope(wp_functional_ppp, Gest, nsim = 999)
Generating 999 simulations of CSR  ...
1, 2, 3, ......10.........20.........30.........40.........50.........60........
.70.........80.........90.........100.........110.........120.........130......
...140.........150.........160.........170.........180.........190.........200....
.....210.........220.........230.........240.........250.........260.........270..
.......280.........290.........300.........310.........320.........330.........340
.........350.........360.........370.........380.........390.........400........
.410.........420.........430.........440.........450.........460.........470......
...480.........490.........500.........510.........520.........530.........540....
.....550.........560.........570.........580.........590.........600.........610..
.......620.........630.........640.........650.........660.........670.........680
.........690.........700.........710.........720.........730.........740........
.750.........760.........770.........780.........790.........800.........810......
...820.........830.........840.........850.........860.........870.........880....
.....890.........900.........910.........920.........930.........940.........950..
.......960.........970.........980.........990........ 999.

Done.
plot(G_wp_functional.csr)

Conclusion: The gray band shows for every distance, the smallest value and the largest value for of G(r) that is obtained out of 1000 simulations. The observed G(r) is far above the G(theo) as well as the envelope - indicating that Functional Water Points are clustered. Hence, we reject the null hypothesis that Functional Water Points are randomly distributed at 99% confident interval. Since the G(r) is above the randomization envelope, that is there are a lot of functional water points that intervent distances, as a result the curve climbs up very quickly. This suggests clustering.

7.1.2 Non-Functional Water Point

Computing G-function estimation

G_wp_nonfunctional = Gest(wp_nonfunctional_ppp, correction = "border")
plot(G_wp_nonfunctional, xlim=c(0,500))

Performing Complete Spatial Randomness Test

# eval = FALSE
G_wp_nonfunctional.csr <- envelope(wp_nonfunctional_ppp, Gest, nsim = 999)
Generating 999 simulations of CSR  ...
1, 2, 3, ......10.........20.........30.........40.........50.........60........
.70.........80.........90.........100.........110.........120.........130......
...140.........150.........160.........170.........180.........190.........200....
.....210.........220.........230.........240.........250.........260.........270..
.......280.........290.........300.........310.........320.........330.........340
.........350.........360.........370.........380.........390.........400........
.410.........420.........430.........440.........450.........460.........470......
...480.........490.........500.........510.........520.........530.........540....
.....550.........560.........570.........580.........590.........600.........610..
.......620.........630.........640.........650.........660.........670.........680
.........690.........700.........710.........720.........730.........740........
.750.........760.........770.........780.........790.........800.........810......
...820.........830.........840.........850.........860.........870.........880....
.....890.........900.........910.........920.........930.........940.........950..
.......960.........970.........980.........990........ 999.

Done.
plot(G_wp_nonfunctional.csr)

Conclusion: The observed G(r) is far above the G(theo) as well as the envelope - indicating that Non-Functional Water Points are clustered. Hence, we reject the null hypothesis that Non Functional Water Points are randomly distributed at 99% confident interval. Since the G(r) is above the randomization envelope, that is there are a lot of functional water points that intervent distances, as a result the curve climbs up very quickly. This suggests clustering.

8.0 Spatial Correlation Analysis

8.1 Data Pre-Processing

For this, we will be using the wp_sf_nga dataframe which has all the water points. We will first convert sf data frames to sp’s Spatial class

wp_spatial <- as_Spatial(wp_sf_nga)

Convert spatial class into generic sp class

wp_sp <- as(wp_spatial, "SpatialPoints")

Converting generic sp format into spatstat’s ppp format

wp_ppp <- as(wp_sp, "ppp")
wp_ppp
Planar point pattern: 5557 points
window: rectangle = [177285.9, 291287.05] x [340054.1, 450859.7] units
plot(wp_ppp)

For this analysis, we will be working with marked data, and we know that the values are categorical (different water points), we need to ensure that the marked field is of factor data type. However, as seen from the output, our status_clean field is of chr data type, not factor! Hence, we will use the as.factor() function:

wp_spatial@data$status_clean <-as.factor(wp_spatial@data$status_clean)

We then convert our spatial data datafram intto ppp format and create an owin object.

wp_spatial_marked_ppp <- as(wp_spatial, "ppp")
wp_spatial_marked_ppp = wp_spatial_marked_ppp[NGA_owin]
plot(wp_spatial_marked_ppp, which.marks = "status_clean")

We use the density() to compute the kernel density objects, and the use plot() to plot it out. Further, we convert the meter to kilometers using rescale()

plot((density(split(rescale(wp_spatial_marked_ppp, 1000)))))

Before we proceed with our second order spatial analysis, we need to assign marks to the ppp objects + check the levels of the marks:

levels(marks(wp_spatial_marked_ppp))
[1] "Abandoned"                        "Abandoned/Decommissioned"        
[3] "Functional"                       "Functional but needs repair"     
[5] "Functional but not in use"        "Non-Functional"                  
[7] "Non-Functional due to dry season" "unknown"                         

As we can see from the marks, and the above density maps, there are 7 levels here. However, we need them to be only classified into 3 - Functional, Non-Functional and unknown. Hence, we will be using the below code to rename the variable.

levels(marks(wp_spatial_marked_ppp))[levels(marks(wp_spatial_marked_ppp)) == "Abandoned"] <- "Non-Functional"
levels(marks(wp_spatial_marked_ppp))[levels(marks(wp_spatial_marked_ppp)) == "Abandoned/Decommissioned"] <- "Non-Functional"
levels(marks(wp_spatial_marked_ppp))[levels(marks(wp_spatial_marked_ppp)) == "Non-Functional due to dry season"] <- "Non-Functional"
levels(marks(wp_spatial_marked_ppp))[levels(marks(wp_spatial_marked_ppp)) == "Functional but needs repair"] <- "Functional"
levels(marks(wp_spatial_marked_ppp))[levels(marks(wp_spatial_marked_ppp)) == "Functional but not in use"] <- "Functional"

Now, upon running the below code, we can see that all the levels are categorized into our 3 desired levels.

levels(wp_spatial_marked_ppp[["marks"]])
[1] "Non-Functional" "Functional"     "unknown"       
plot(wp_spatial_marked_ppp, which.marks = "status_clean")

plot(density(split(wp_spatial_marked_ppp)))

From the above density maps, we can observe a relationship between Functional and Non-Functional water points - in the sense that the there seems to be a lot of water Functional water points where ever there are Non-Functional water points. They seem to coexist together quite a lot of times, and hence indicating some level of dependence between them. Hence, to investigate this, we will be using the Cross K-Function.

  • H0: The spatial distribution of Functional and Non-Functional water points are spatially independent

  • H1: The spatial distribution of Functional and Non-Functional water points are not spatially independent

  • Confidence Level: 99%

  • Significance Level: 0.01%

The null-hypothesis will be rejected if the p-value is smaller than alpha value of 0.01.

# eval = FALSE
wp_spatial_marked_ppp_Lcross.csr <- envelope(wp_spatial_marked_ppp, 
                                 Lcross, 
                                 i="Functional", 
                                 j="Non-Functional", 
                                 correction="border", 
                                 nsim=999)
Generating 999 simulations of CSR  ...
1, 2, 3, ......10 [etd 4:31] .........20 [etd 4:06] .........
30 [etd 4:13] .........40 [etd 4:13] .........50 [etd 4:08] ........
.60 [etd 4:13] .........70 [etd 4:05] .........80 [etd 4:02] .......
..90 [etd 3:56] .........100 [etd 3:51] .........110 [etd 3:49] ......
...120 [etd 3:47] .........130 [etd 3:44] .........140 [etd 3:39] .....
....150 [etd 3:37] .........160 [etd 3:32] .........170 [etd 3:31] ....
.....180 [etd 3:28] .........190 [etd 3:25] .........200 [etd 3:23] ...
......210 [etd 3:20] .........220 [etd 3:17] .........230 [etd 3:16] ..
.......240 [etd 3:12] .........250 [etd 3:10] .........260 [etd 3:08] .
........270 [etd 3:06] .........280 [etd 3:03] .........290
 [etd 3:02] .........300 [etd 3:02] .........310 [etd 3:03] .........
320 [etd 3:00] .........330 [etd 2:57] .........340 [etd 2:55] ........
.350 [etd 2:53] .........360 [etd 2:51] .........370 [etd 2:48] .......
..380 [etd 2:46] .........390 [etd 2:43] .........400 [etd 2:41] ......
...410 [etd 2:37] .........420 [etd 2:36] .........430 [etd 2:34] .....
....440 [etd 2:31] .........450 [etd 2:28] .........460 [etd 2:26] ....
.....470 [etd 2:23] .........480 [etd 2:20] .........490 [etd 2:17] ...
......500 [etd 2:14] .........510 [etd 2:11] .........520 [etd 2:09] ..
.......530 [etd 2:06] .........540 [etd 2:04] .........550 [etd 2:01] .
........560 [etd 1:59] .........570 [etd 1:56] .........580
 [etd 1:53] .........590 [etd 1:50] .........600 [etd 1:47] .........
610 [etd 1:44] .........620 [etd 1:42] .........630 [etd 1:40] ........
.640 [etd 1:37] .........650 [etd 1:35] .........660 [etd 1:32] .......
..670 [etd 1:30] .........680 [etd 1:27] .........690 [etd 1:25] ......
...700 [etd 1:22] .........710 [etd 1:19] .........720 [etd 1:17] .....
....730 [etd 1:14] .........740 [etd 1:11] .........750 [etd 1:08] ....
.....760 [etd 1:06] .........770 [etd 1:03] .........780 [etd 1:01] ...
......790 [etd 58 sec] .........800 [etd 55 sec] .........810 [etd 53 sec] ..
.......820 [etd 50 sec] .........830 [etd 47 sec] .........840 [etd 44 sec] .
........850 [etd 41 sec] .........860 [etd 38 sec] .........870
 [etd 36 sec] .........880 [etd 33 sec] .........890 [etd 30 sec] .........
900 [etd 27 sec] .........910 [etd 24 sec] .........920 [etd 22 sec] ........
.930 [etd 19 sec] .........940 [etd 16 sec] .........950 [etd 13 sec] .......
..960 [etd 11 sec] .........970 [etd 8 sec] .........980 [etd 5 sec] ......
...990 [etd 2 sec] ........ 999.

Done.
plot(wp_spatial_marked_ppp_Lcross.csr, xlab="distance(km)", xlim=c(0,500))

Conclusion : The Functional and Non-Functional water point are not statistically independent as the empirical k-cross line is outside of the envelope of 99% confidence level, and for that we reject the null hypothesis. This lends weight to what we noticed in our previous maps, where the location of the Functional and Non-Functional water points seem to coincide. And hence, the relation between spatial distribution of Functional and Non-Functional water points is not independent.

8.2 Colocation of Functional and Non-Functional Water Points

Additionally, to understand the spatial correlation, we will be finding the colocation of functional and non-functional water point. The ‘unknown’ water points are not required, and hence, we use the filter() to remove them.

wp_sf_withoutUnknown <- wp_sf_nga %>%  filter(!status_clean=='unknown')

In the code chunk below, st_knn() of sfdep package is used to determine the k (i.e. 6) nearest neighbors for given point geometry. The function st_kernel_weights() is used to derive a weights list by using a kernel function. To compute LCLQ, the reference point data must be in either character or vector list. We create two vector lists - one of Functional and for Non-Functional water point and are called A and B respectively. The code local_colocation() is used to compute the LCLQ values for each water point. Before we can plot the LCLQ values their p-values, we need to join the output of local_colocation() to the stores sf data.frame.

nb <- include_self(
  st_knn(st_geometry(wp_sf_withoutUnknown),6))

wt <- st_kernel_weights(nb,
                        wp_sf_withoutUnknown,
                        "gaussian",
                        adaptive = TRUE)

A <- wp_functional$status_clean

B <- wp_nonfunctional$status_clean

LCLQ <- local_colocation(A, B, nb, wt, 49)

LCLQ_stores <- cbind(wp_sf_withoutUnknown, LCLQ)

The below map plots the LCLQ analysis

tmap_mode("view")
tm_shape(NGA) + 
  tm_polygons() + 
  tm_shape(LCLQ_stores)+
    tm_dots(col = "Non.Functional",
            size = 0.05,
            border.col = "grey",
            border.lwd = 0.5) + 
  tm_view(set.zoom.limits = c(8,11))
tmap_mode("view")
tm_shape(NGA) + 
  tm_polygons() + 
  tm_shape(LCLQ_stores)+
    tm_dots(col = "Abandoned",
            size = 0.05,
            border.col = "grey",
            border.lwd = 0.5) + 
  tm_view(set.zoom.limits = c(9,13))
tmap_mode("view")
tm_shape(NGA) + 
  tm_polygons() + 
  tm_shape(LCLQ_stores)+
    tm_dots(col = "Non.Functional.due.to.dry.season",
            size = 0.05,
            border.col = "grey",
            border.lwd = 0.5) + 
  tm_view(set.zoom.limits = c(9,11))
tmap_mode("view")
tm_shape(NGA) + 
  tm_polygons() + 
  tm_shape(LCLQ_stores)+
    tm_dots(col = "Abandoned.Decommissioned",
            size = 0.05,
            border.col = "grey",
            border.lwd = 0.5) + 
  tm_view(set.zoom.limits = c(9,11))

The above maps show the colocation of Functional Water Points and Non-Functional Water Points. This means that Non-functional water points in color (‘Non-Functional’, ‘Abandoned’, ‘Non-functional due to dry season’) are surrounded by several Functional water points and hence, are colocated. The above code evaluates each Non-Functional water point individually for colocation with the presence of Functional water points. Hence, if the number of Non-functional water points within the neighborhood of Functional water points are higher than the global proportion of Non-Functional water points, the colocation point will be higher. As we can see from the above maps, there are many Non-functional water points colocated with Functional water points compared to that of ‘Abandoned’ or ‘Non-functional due to dry season’.

9.0 References

  1. Professor Kam Tin Seong’s Hands-on Excercises

  2. G - Function Point Patter Analysis

  3. Selection of bandwidth type and adjustment side in kernel density estimation over inhomogeneous backgrounds (Paper)

  4. How IDW works (Reference)